Abstract

A large body of literature exists on techniques for selecting the important variables in linear regression analysis. Many of these techniques are ad hoc in nature and have not been studied from a theoretical viewpoint. In this paper we discuss some of the more commonly used techniques and propose a selection procedure based on the statistical selection and ranking approach. This procedure is easy to compute and apply. The procedure depends on the goodness of fit of the model and the total error associated with it.


1.  Introduction 

The problem of determining the important ("best") subset of independent variables has long been of interest to applied statisticians. Primarily because of the current availability of high-speed computation, this problem has received considerable attention in the recent statistical literature. Hocking (1976) has given an excellent survey of the existing techniques. Several other papers have dealt with various aspects of the problem, but it appears that the typical regression user has not benefited appreciably. One reason for the lack of resolution is that the problem has not been well defined. For the procedures usually discussed in textbooks, the probability of a correct selection is not guaranteed.

The problem of selecting a subset of independent or predictor variables is usually described in an idealized setting. That is, it is assumed that

(1) the analyst has data on a large number of potential variables, which include all relevant variables and appropriate functions of them plus, possibly, some other extraneous variables and variable functions, and

(2) the analyst has available "good" data on which to base the eventual conclusions.

The analysis of residuals (see Draper and Smith (1981)) may reveal different functional forms which might be considered, and may even suggest variables which are not initially included. We assume that the process of model building has been completed and the resulting models are true. The problem is to determine an "appropriate" regression model based on a subset of the original set of variables. In this problem there are three ingredients, namely,

(a) the computational technique(s) used to provide the information for the analysis,

(b) the criterion used to analyze the variables and select a subset, if that is appropriate, and

(c) the estimation of the coefficients in the final equation (cf. Hocking (1976, 1983)).

In this paper, we study this problem from the viewpoint of statistical ranking and selection in order to investigate some selection criteria. From this approach we obtain some useful procedures for selecting important regression variables.


In these studies, we have found that the reduced models are based on noncentrality parameters, which provide a measure of goodness of fit for the fitted models. We also propose a statistic to measure the standardized total squared error, and study the bias of the fitted model. The statistic we propose is an unbiased estimator, which is different from Mallows' $C_p$ statistic. Based on this statistic, a two-stage selection procedure is proposed and studied.

Finally, we show the relation between the noncentrality parameter and the statistic we propose. Both should be used to select models with a good fit and low bias while, at the same time, keeping the total squared error as small as possible. An asymptotic result is also studied to determine the value $\Delta$ of the bias. This asymptotic result enables us to determine how many regression variables, at a minimum, will be neglected.

2.  Some  Selection  Criteria 

Consider  the  usual  linear  model 

$$Y = X\beta + \varepsilon, \qquad (2.1)$$

where $Y' = [Y_1, \ldots, Y_n]$ is a $1 \times n$ vector of a random sample, $X = [\mathbf{1}, X_1]$ is an $n \times p$ matrix of known constants, $\beta' = [\beta_0, \beta_1, \ldots, \beta_{p-1}]$ is a $1 \times p$ vector of unknown parameters, and $\varepsilon \sim N(0, \sigma_0^2 I_n)$. Here $I_n$ denotes the identity matrix of order $n \times n$.

The model (2.1), having $p-1$ independent variables, is considered the true model. Any reduced model whose "X matrix" has $r$ columns is obtained by retaining any $r-1$ of the $p-1$ independent variables $X_1, \ldots, X_{p-1}$, where $2 \le r \le p$. For each $r$, $2 \le r \le p$, there are $k_r = \binom{p-1}{r-1}$ such models. These $k_r$ reduced models of "size" $r$ are indexed arbitrarily, with the indexing variable $i$ going from 1 to $k_r$. We will refer to a typical model as Model $M_{ri}$. A reduced model of size $r$ can be written as

$$E(Y) = X_{ri}\beta_{ri}, \quad i = 1, 2, \ldots, k_r, \qquad (2.2)$$

where $X_{ri}$ and $\beta_{ri}$ are obtained from $X$ and $\beta$ by keeping the columns and entries corresponding to the variables that are retained in the model.

It should be pointed out that all expectations and probabilities are calculated under the true model (2.1).


Usually, we use the residual sum of squares to measure the goodness of the fitted model for a random sample. Hence, the expected residual sum of squares is naturally considered as the measure of goodness of fit; large values of this expectation are not desirable. Note that all our comparisons of models are made under the true-model assumptions.

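For any $r$, $2 \le r \le p$, the residual sum of squares $SS_{ri}$ for the reduced model $M_{ri}$, $1 \le i \le k_r$, is

$$SS_{ri} = Y'[I - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}']Y = Y'Q_{ri}Y, \qquad (2.3)$$

where $Q_{ri} = I - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'$. Also, $SS_{ri}/\sigma_0^2$ has a noncentral chi-square distribution,

$$SS_{ri}/\sigma_0^2 \sim \chi'^2(\nu_r, \lambda_{ri}), \qquad (2.4)$$

where the degrees of freedom are $\nu_r = n - r$ and the noncentrality parameter is

$$\lambda_{ri} = (X\beta)'Q_{ri}(X\beta)/(2\sigma_0^2).$$

We note that $Q_{ri}$ is idempotent and symmetric; thus it is positive semidefinite. Hence $\lambda_{ri}$ is nonnegative, but not zero in general.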


We have

$$E(SS_{ri}) = \nu_r\sigma_0^2 + 2\sigma_0^2\lambda_{ri}. \qquad (2.5)$$

Since $\sigma_0^2$ is fixed, it is clear from (2.5) that $\lambda_{ri}$ should not be large for a good model.

We define any reduced model whose noncentrality parameter satisfies $\lambda_{ri} \ge \Delta$ to be inferior, where $\Delta\,(>0)$ is a specified constant. Our goal is to eliminate all inferior models from the set of $2^{p-1} - 1$ regression models, which includes the true model.

The residual sum of squares for the full model is denoted by $SS_{p1}$. Then

$$E[SS_{p1}/(n-p)] = \sigma_0^2.$$

Hence, we use $SS_{p1}/(n-p)$ to estimate $\sigma_0^2$, and denote

$$\hat\sigma_0^2 = SS_{p1}/(n-p).$$


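Now, let $R_{p1}^2$ and $R_{ri}^2$ denote the squared multiple correlation coefficients of the models (2.1) and (2.2), respectively. Hence

$$R_{p1}^2 = 1 - \frac{SS_{p1}}{(Y - \bar{\mathbf{Y}})'(Y - \bar{\mathbf{Y}})} \quad\text{and}\quad R_{ri}^2 = 1 - \frac{SS_{ri}}{(Y - \bar{\mathbf{Y}})'(Y - \bar{\mathbf{Y}})},$$

where $\bar{\mathbf{Y}}' = (\bar{Y}, \ldots, \bar{Y})$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. From (2.5), we propose $\hat\lambda_{ri}$ as an estimator of $\lambda_{ri}$, where

$$\hat\lambda_{ri} = \frac{n-p}{2}\,\frac{SS_{ri}}{SS_{p1}} - \frac{\nu_r}{2} = \frac{n-p}{2}\,\frac{1 - R_{ri}^2}{1 - R_{p1}^2} - \frac{\nu_r}{2}. \qquad (2.6)$$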
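A minimal Python sketch of the estimator (2.6), using only the standard library. It computes $\hat\lambda_{ri}$ both from the residual sums of squares and from the two $R^2$ values; the two routes agree because $R_{p1}^2$ and $R_{ri}^2$ share the same total sum of squares. The function names and all numbers are hypothetical, chosen only to exercise the formula.

```python
def lambda_hat_from_ss(ss_ri: float, ss_p1: float, n: int, p: int, r: int) -> float:
    """Estimator (2.6): (n - p)/2 * SS_ri/SS_p1 - nu_r/2, with nu_r = n - r."""
    nu_r = n - r
    return 0.5 * (n - p) * ss_ri / ss_p1 - 0.5 * nu_r


def lambda_hat_from_r2(r2_ri: float, r2_p1: float, n: int, p: int, r: int) -> float:
    """The same estimator written with the squared multiple correlations."""
    nu_r = n - r
    return 0.5 * (n - p) * (1.0 - r2_ri) / (1.0 - r2_p1) - 0.5 * nu_r


# Hypothetical sums of squares: n = 20 observations, p = 5 columns in the full model.
sst = 1000.0   # total corrected sum of squares (Y - Ybar)'(Y - Ybar)
ss_p1 = 60.0   # residual SS of the full model
ss_ri = 150.0  # residual SS of a reduced model with r = 3 columns
n, p, r = 20, 5, 3

lam_ss = lambda_hat_from_ss(ss_ri, ss_p1, n, p, r)
lam_r2 = lambda_hat_from_r2(1 - ss_ri / sst, 1 - ss_p1 / sst, n, p, r)
print(lam_ss, lam_r2)
```

For the full model itself ($r = p$, $SS_{ri} = SS_{p1}$) the estimate is exactly zero, matching $\lambda_{p1} = 0$.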


Proposed  Selection  Procedure. 

We  propose  the  selection  rule  S  as  follows: 

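$$\text{Exclude (reject) the reduced model } M_{ri} \text{ iff } \hat\lambda_{ri} \ge d_{ri}, \qquad (2.7)$$

where $d_{ri}$ is determined by $\inf_{\lambda_{ri} \ge \Delta} P\{\hat\lambda_{ri} \ge d_{ri}\} = P^*$, $0 < P^* < 1$. It can be shown that the following are equivalent forms:

$$\frac{SS_{ri}}{SS_{p1}} \ge \frac{2}{n-p}\left(d_{ri} + \frac{\nu_r}{2}\right), \qquad (2.8)$$

$$1 - R_{ri}^2 \ge \frac{2}{n-p}\left(d_{ri} + \frac{\nu_r}{2}\right)(1 - R_{p1}^2), \qquad (2.9)$$

$$V_{ri} = \frac{(SS_{ri} - SS_{p1})/(p-r)}{SS_{p1}/(n-p)} \ge \frac{2d_{ri} + \nu_r - (n-p)}{p-r} = 1 + \frac{2d_{ri}}{p-r}. \qquad (2.10)$$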


Hence, the correct decision of excluding all inferior models $M_{ri}$ with guaranteed probability $P^*$ is equivalent to excluding $M_{ri}$ whenever

$$V_{ri} \ge D_{ri}. \qquad (2.11)$$

It is well known that the statistic

$$V_{ri} = \frac{(SS_{ri} - SS_{p1})/(p-r)}{SS_{p1}/(n-p)}$$

follows the noncentral F distribution, denoted $F'(p-r, n-p, \lambda_{ri})$ (cf. Graybill (1976)). Thus the critical value in (2.11) can be computed from

$$\inf_{\lambda_{ri} \ge \Delta} P\{V_{ri} \ge D_{ri}\} = P^*. \qquad (2.12)$$

From Ghosh (1973), the noncentral F distribution is stochastically increasing in $\lambda_{ri}$, so the infimum in (2.12) is attained at $\lambda_{ri} = \Delta$. Thus we can compute $D_{ri}$ through the equation

$$P\{V_{ri} \ge D_{ri} \mid \lambda_{ri} = \Delta\} = P^*, \qquad (2.13)$$

where $D_{ri} = 1 + 2d_{ri}/(p-r)$, as in (2.10).
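The equivalence between the rule on the $\hat\lambda_{ri}$ scale and the rule on the $V_{ri}$ scale in (2.10) can be checked numerically. The sketch below (hypothetical helper names, randomly generated inputs) applies both forms of the rule to many simulated cases and counts disagreements, which should be none apart from floating-point ties.

```python
import random


def d_to_D(d, p, r):
    """Cut-off on the V scale implied by a cut-off d on the lambda-hat scale, per (2.10)."""
    return 1.0 + 2.0 * d / (p - r)


def reject_by_lambda_hat(ss_ri, ss_p1, n, p, r, d):
    lam_hat = 0.5 * (n - p) * ss_ri / ss_p1 - 0.5 * (n - r)  # estimator (2.6)
    return lam_hat >= d


def reject_by_V(ss_ri, ss_p1, n, p, r, d):
    v = ((ss_ri - ss_p1) / (p - r)) / (ss_p1 / (n - p))  # statistic V_ri
    return v >= d_to_D(d, p, r)


rng = random.Random(2024)
disagreements = 0
for _ in range(10_000):
    p = rng.randint(3, 8)
    r = rng.randint(2, p - 1)                # proper reduced models only, so p - r >= 1
    n = rng.randint(p + 1, 40)               # keeps n - p >= 1
    ss_p1 = rng.uniform(1.0, 100.0)
    ss_ri = ss_p1 + rng.uniform(0.0, 200.0)  # a reduced model never fits better than the full one
    d = rng.uniform(0.0, 10.0)
    if reject_by_lambda_hat(ss_ri, ss_p1, n, p, r, d) != reject_by_V(ss_ri, ss_p1, n, p, r, d):
        disagreements += 1
print(disagreements)
```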

Now,  we  rewrite  the  selection  procedure  S  as  follows. 

Theorem 1. The selection procedure S is equivalent to the following:

Exclude $M_{ri}$ as an inferior model iff $V_{ri} \ge D_{ri}$,

where $D_{ri}$ depends on $\Delta$, $n$, $p$, $r$ and $P^*$ and is chosen to satisfy

$$P\{V_{ri} \ge D_{ri} \mid \lambda_{ri} = \Delta\} = P^*.$$
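Equation (2.13) has no closed form, but $D_{ri}$ can be approximated by simulation. The sketch below draws $V_{ri}$ from $F'(p-r, n-p, \Delta)$ in the paper's parametrization (the numerator chi-square has mean $(p-r) + 2\Delta$) and takes the $(1 - P^*)$ sample quantile. Monte Carlo is a swapped-in technique used here for illustration only; it is not the method of the paper, which relies on the approximation described under "Computation of Constants". The function names, the number of draws and the seed are assumptions.

```python
import math
import random


def draw_noncentral_f(rng, df1, df2, lam):
    """One draw from F'(df1, df2, lam), lam scaled as in the text (numerator mean df1 + 2*lam)."""
    num = (rng.gauss(0.0, 1.0) + math.sqrt(2.0 * lam)) ** 2
    num += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df1 - 1))
    den = rng.gammavariate(df2 / 2.0, 2.0)  # central chi-square with df2 degrees of freedom
    return (num / df1) / (den / df2)


def critical_D_monte_carlo(n, p, r, delta, guaranteed_prob, draws=100_000, seed=0):
    """Approximate D_ri in (2.13): the (1 - P*) quantile of V_ri when lambda_ri = delta."""
    rng = random.Random(seed)
    sample = sorted(draw_noncentral_f(rng, p - r, n - p, delta) for _ in range(draws))
    return sample[int((1.0 - guaranteed_prob) * draws)]


D = critical_D_monte_carlo(13, 5, 2, 3.0, 0.90)
print(D)
```

Because the noncentral F is stochastically increasing in its noncentrality parameter, a larger $\Delta$ should give a larger $D_{ri}$.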

Total Squared Error as a Criterion for Goodness of Fit.

A measure of "total squared error" was first given by Mallows (1973). He used the statistic called $C_p$ to measure the sum of the squared biases plus the squared random errors in $Y$ at all $n$ data points. Daniel and Wood (1980) described the problem as follows. The total squared error (bias plus random) for $n$ data points, using a fitted model $M_{ri}$ with $r$ terms, is

$$\sum_{j=1}^{n}\left[(\mu_j - \eta_{ij})^2 + \operatorname{var}(\hat{Y}_{ij})\right], \qquad (2.14)$$

where

$\mu_j = \mu(X_{1j}, X_{2j}, \ldots)$ is the expected value from the true equation,

$\eta_{ij} = \beta_0 + \sum_{l=1}^{r-1}\beta_l X_{lj}$ is the expected value from the fitted model $M_{ri}$ being used (the sum runs over the $r-1$ variables retained in $M_{ri}$),

$(\mu_j - \eta_{ij})$ is the bias at the $j$th data point, and

$\hat{Y}_i = (\hat{Y}_{i1}, \ldots, \hat{Y}_{in})' = X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'Y$ is the predicted value under the least squares estimate in the reduced model $M_{ri}$.


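For convenience, let $SSB_{ri}$ stand for $\sum_{j=1}^{n}(\mu_j - \eta_{ij})^2$, and define a quantity $\Gamma_{ri}$, the standardized total squared error, by

$$\Gamma_{ri} = \frac{SSB_{ri}}{\sigma_0^2} + \frac{1}{\sigma_0^2}\sum_{j=1}^{n}\operatorname{var}(\hat{Y}_{ij}). \qquad (2.15)$$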


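Since

$$\hat{Y}_i = X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'Y,$$

we have

$$\begin{aligned}\operatorname{cov}(\hat{Y}_i) &= E\{[X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'Y - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'X\beta]\,[X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'Y - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'X\beta]'\} \\ &= \sigma_0^2 X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'.\end{aligned}$$

Hence, since the trace of this projection matrix equals its rank $r$, we obtain

$$\sum_{j=1}^{n}\operatorname{var}(\hat{Y}_{ij}) = \sigma_0^2\operatorname{tr}X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}' = r\sigma_0^2. \qquad (2.16)$$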


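$$\begin{aligned}SSB_{ri} &= \sum_{j=1}^{n}(\mu_j - \eta_{ij})^2 = \sum_{j=1}^{n}[E(Y_j) - E(\hat{Y}_{ij})]^2 \\ &= \{E([I - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}']Y)\}'\{E([I - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}']Y)\} \\ &= (X\beta)'[I - X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'](X\beta) \\ &= 2\sigma_0^2\lambda_{ri}.\end{aligned} \qquad (2.17)$$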


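$$\begin{aligned}E(SS_{ri}) &= E[(Y - \hat{Y}_i)'(Y - \hat{Y}_i)] = E\{Y'Q_{ri}Y\} \\ &= \operatorname{tr}\{Q_{ri}E[(Y - X\beta)(Y - X\beta)']\} + (X\beta)'Q_{ri}(X\beta) \\ &= \operatorname{tr}(Q_{ri})\,\sigma_0^2 + (X\beta)'Q_{ri}(X\beta) \\ &= (n - r)\sigma_0^2 + 2\sigma_0^2\lambda_{ri}.\end{aligned} \qquad (2.18)$$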


From (2.17) and (2.18), we have

$$E(SS_{ri}) = SSB_{ri} + (n - r)\sigma_0^2. \qquad (2.19)$$


From (2.16) and (2.19), we can rewrite (2.15) as

$$\Gamma_{ri} = \frac{E(SS_{ri})}{\sigma_0^2} - (n - 2r) = \nu_r + 2\lambda_{ri} - (n - 2r) = 2\lambda_{ri} + r. \qquad (2.20)$$


We have an unbiased estimate of $\Gamma_{ri}$ (for $n - p > 2$) as follows:

$$\hat\Gamma_{ri} = \frac{2(n-p-2)(p-r)}{(n-p)(n-r)}\cdot\frac{SS_{ri}}{\hat\sigma_0^2} - (2p - 3r).$$


From (2.20), we see that $\lambda_{ri}$ is a measure of the error. That is, $\lambda_{ri}$ is used to measure the fitness of the reduced model $M_{ri}$. If the fit is good, then $\lambda_{ri} \approx 0$, and from (2.20), $\Gamma_{ri} \approx r$. (The notation $\approx$ means "approximately equal".) We require the total squared error to be small for a good fit. Hence $\Gamma_{ri}$ should be as small as possible and close to $r$, and so $\hat\Gamma_{ri}$ should be as small as possible.

We summarize these results in the following theorem.


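Theorem 2. The total squared error (bias plus random) for $n$ data points, using a fitted model $M_{ri}$ with $r$ terms, as defined by Mallows (1973) (see also Daniel and Wood (1980)), is

$$\sum_{j=1}^{n}\left[(\mu_j - \eta_{ij})^2 + \operatorname{var}(\hat{Y}_{ij})\right],$$

where $\eta_{ij} = E(\hat{Y}_{ij})$ and $\hat{Y}_i = (\hat{Y}_{i1}, \ldots, \hat{Y}_{in})' = X_{ri}(X_{ri}'X_{ri})^{-1}X_{ri}'Y$. Now

$$\hat\Gamma_{ri} = \frac{2(n-p-2)(p-r)}{(n-p)(n-r)}\cdot\frac{SS_{ri}}{\hat\sigma_0^2} - (2p - 3r) \qquad (2.21)$$

is an unbiased estimator of the standardized total squared error $\Gamma_{ri}$. Also, if $\lambda_{ri} \approx 0$, then $\Gamma_{ri} \approx r$.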
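A direct transcription of (2.21) as a Python function, offered as a sketch (the function name is hypothetical). The input value 211.7 for $SS_{ri}/\hat\sigma_0^2$ is the one implied by the $X_1$-only row of the Hald example ($C_{ri} + n - 2r$ with $C_{ri} = 202.7$, $n = 13$, $r = 2$). For $r = p$ the coefficient vanishes and the estimate reduces to $p$, the value of $\Gamma$ for the full model; the second call uses arbitrary inputs to show this.

```python
def gamma_hat(ss_ri, sigma2_hat, n, p, r):
    """Estimate (2.21) of the standardized total squared error Gamma_ri."""
    coef = 2.0 * (n - p - 2) * (p - r) / ((n - p) * (n - r))
    return coef * ss_ri / sigma2_hat - (2 * p - 3 * r)


# SS_ri / sigma2_hat = 211.7 for the X1-only model (n = 13, p = 5, r = 2).
print(round(gamma_hat(211.7, 1.0, 13, 5, 2), 1))  # 82.6
# Full model (r = p): the estimate is p whatever the (arbitrary) inputs.
print(gamma_hat(123.0, 4.0, 13, 5, 5))  # 5.0
```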


The Relation between $\hat\lambda_{ri}$ and $\hat\Gamma_{ri}$

From (2.6) and (2.21), we have

$$\hat\Gamma_{ri} = \frac{2(n-p-2)(p-r)}{(n-p)(n-r)}\left(2\hat\lambda_{ri} + n - r\right) - (2p - 3r). \qquad (2.22)$$

Hocking (1976) pointed out that the $R_{ri}^2$ plot may be quite flat. The coefficient $(n - p - 2)$ can magnify small differences, causing $\hat\Gamma_{ri}$ to increase sharply as $R_{ri}^2$ is decreased.

The Relation between the F-statistic $V_{ri}$ and $\hat\Gamma_{ri}$

From Theorem 2, we have

$$\hat\Gamma_{ri} = \frac{2(n-p-2)(p-r)}{n-p}\,V_{ri} - (2p - 3r).$$


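The Relation between Mallows' $C_p$ and $\hat\Gamma_{ri}$

Mallows' $C_{ri}$ is defined as follows:

$$C_{ri} = \frac{SS_{ri}}{\hat\sigma_0^2} - (n - 2r).$$

Hence

$$C_{ri} = (n - r)V_{ri} - (n - 2r)$$

and

$$\hat\Gamma_{ri} = \frac{2(n-p-2)(p-r)}{(n-p)(n-r)}\left[C_{ri} + (n - 2r)\right] - (2p - 3r).$$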
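The relations above can be chained to recover the $V_{ri}$ and $\hat\Gamma_{ri}$ columns of the Hald example table from the published $C_{ri}$ values. This is a sketch with hypothetical helper names; the rows used are taken from that table ($n = 13$, $p = 5$). It also confirms that the route through $V_{ri}$ and the route through $C_{ri}$ give the same $\hat\Gamma_{ri}$.

```python
def v_from_cp(cp, n, r):
    """Invert C_ri = (n - r) V_ri - (n - 2r)."""
    return (cp + n - 2 * r) / (n - r)


def gamma_hat_from_cp(cp, n, p, r):
    coef = 2.0 * (n - p - 2) * (p - r) / ((n - p) * (n - r))
    return coef * (cp + n - 2 * r) - (2 * p - 3 * r)


def gamma_hat_from_v(v, n, p, r):
    return 2.0 * (n - p - 2) * (p - r) / (n - p) * v - (2 * p - 3 * r)


n, p = 13, 5
# (variables, r, C_ri) rows taken from the Hald example table.
rows = [("X1", 2, 202.7), ("X1,X2", 3, 2.7), ("X3,X4", 3, 22.4), ("X1,X2,X3", 4, 3.0)]
for name, r, cp in rows:
    v = v_from_cp(cp, n, r)
    print(name, round(v, 2), round(gamma_hat_from_cp(cp, n, p, r), 1), round(gamma_hat_from_v(v, n, p, r), 1))
```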


3. A Two-Stage Selection Procedure $R_g$

Now we propose the following selection procedure, which depends on the procedure S.

$R_g$: At stage 1, apply the selection procedure S to select some desirable reduced models, denoted by the set T. At stage 2, from the set T, we select the reduced model associated with the smallest $\hat\Gamma_{ri}$.
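The two stages of $R_g$ can be sketched as a small Python function. The `ReducedModel` record, the dictionary of stage-1 cut-offs keyed by $r$, and the convention that a model whose $r$ has no cut-off (the full model, $r = p$) is always kept are assumptions made for illustration; the candidate models are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReducedModel:
    name: str         # label for the retained variables, e.g. "X1,X2"
    r: int            # number of columns of X_ri (intercept included)
    v: float          # observed V_ri
    gamma_hat: float  # estimated standardized total squared error


def two_stage_rg(models, cutoffs):
    """Stage 1: drop M_ri when V_ri >= D_r; stage 2: pick the survivor with the smallest gamma_hat."""
    survivors = [m for m in models if m.r not in cutoffs or m.v < cutoffs[m.r]]
    best = min(survivors, key=lambda m: m.gamma_hat)
    return survivors, best


# Hypothetical candidates with p = 4 (the full model has r = 4 and no cut-off).
candidates = [
    ReducedModel("A", 2, 5.0, 20.0),
    ReducedModel("B", 3, 0.8, 2.5),
    ReducedModel("C", 3, 1.5, 1.0),
    ReducedModel("full", 4, 1.0, 4.0),
]
kept, chosen = two_stage_rg(candidates, {2: 0.9, 3: 1.1})
print([m.name for m in kept], chosen.name)
```

Model "C" has the smallest $\hat\Gamma$ of all candidates but is removed at stage 1, which is the point of applying the selection rule before the $\hat\Gamma$ comparison.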

From (2.8), (2.9), and (2.10), we see that the following selection rules $S_1$ and $S_2$ are both equivalent to S:

$S_1$: reject model $M_{ri}$ if $(1 - R_{ri}^2) \ge d_1(1 - R_{p1}^2)$;

$S_2$: reject model $M_{ri}$ if $V_{ri} \ge d_2$;

where $d_1$ and $d_2$ depend on $n$, $p$, $r$, $\Delta$ and $P^*$.

Gupta, Huang and Chang (1984) have studied some optimal properties of $S_2$. Huang and Panchapakesan (1982) have studied some selection procedures related to $S_1$. $S_2$ can be used in stepwise regression analysis, and $S_1$ can be used for the analysis of all possible regression models.

From the previous discussion, one can use $S_2$ to compute the critical values $d_2$ that decide the acceptance or rejection of the reduced models. From the selected models, we choose a suitable one by plotting $\hat\Gamma_{ri}$ against $r$, with $\hat\Gamma_{ri}$ as small as possible (see Theorem 2). It follows from the fact that $SSB_{ri}/\sigma_0^2 = 2\lambda_{ri}$ that large values of $\lambda_{ri}$ measure the degree of departure from the line $\Gamma_{ri} = r$.


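Computation of Constants $D_{ri}$

Patnaik (1949) provided an approximation to the noncentral F distribution (cf. Guenther (1979)) by the relation

$$F'(p_1, p_2, \lambda) \approx \frac{p_1 + 2\lambda}{p_1}\,F(p^*, p_2),$$

where

$$p^* = (p_1 + 2\lambda)^2/(p_1 + 4\lambda).$$

Hence, we can determine $D_{ri}$ from the following approximate equation (see Theorem 1):

$$P\{V_{ri} \ge D_{ri} \mid \lambda_{ri} = \Delta\} \approx P\left\{F(p^*, n-p) \ge \frac{(p-r)D_{ri}}{p - r + 2\Delta}\right\} = P^*,$$

where $p^* = (p - r + 2\Delta)^2/(p - r + 4\Delta)$ and $F(p^*, n-p)$ is a statistic that follows the central F distribution with $p^*$ and $n - p$ degrees of freedom.

Ghosh (1973) has shown that $P\{F(p_1, p_2) > c\}$ is monotone decreasing in $p_1$ and increasing in $p_2$. Thus we can use an interpolation method to obtain the critical value $c = F(p^*, n-p; P^*)$, where $F(a, b; \alpha)$ denotes the upper $\alpha$ point of the central F distribution with $a$ and $b$ degrees of freedom, by noting that $F(p^*, n-p; P^*) = 1/F(n-p, p^*; 1-P^*)$, so that

$$D_{ri} = \frac{p - r + 2\Delta}{p - r}\,F(p^*, n-p; P^*).$$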
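Below is a sketch of the Patnaik route to $D_{ri}$. Instead of interpolating in F tables, it estimates the central F quantile with fractional degrees of freedom $p^*$ by simulation, using `random.gammavariate(k/2, 2)` for a chi-square variate with $k$ (possibly non-integer) degrees of freedom. The Monte Carlo step, the names, the number of draws and the seed are assumptions. For $n = 13$, $p = 5$, $r = 2$, $\Delta = 3$ and $P^* = 0.90$ the result should land close to the value 0.939 reported in the example.

```python
import random


def patnaik_parameters(df1, lam):
    """Scale and equivalent d.f. p* with F'(df1, df2, lam) ~ scale * F(p*, df2)."""
    scale = (df1 + 2.0 * lam) / df1
    p_star = (df1 + 2.0 * lam) ** 2 / (df1 + 4.0 * lam)
    return scale, p_star


def central_f_lower_quantile(df1, df2, prob, draws=200_000, seed=0):
    """Monte Carlo c with P{F(df1, df2) <= c} ~= prob; df1 may be fractional."""
    rng = random.Random(seed)
    sample = sorted(
        (rng.gammavariate(df1 / 2.0, 2.0) / df1) / (rng.gammavariate(df2 / 2.0, 2.0) / df2)
        for _ in range(draws)
    )
    return sample[int(prob * draws)]


def D_patnaik(n, p, r, delta, guaranteed_prob):
    """Approximate D_ri from P{F(p*, n - p) >= (p - r) D / (p - r + 2 delta)} = P*."""
    scale, p_star = patnaik_parameters(p - r, delta)
    # The upper-P* point of F(p*, n - p) is its lower (1 - P*) quantile.
    c = central_f_lower_quantile(p_star, n - p, 1.0 - guaranteed_prob)
    return scale * c


print(D_patnaik(13, 5, 2, 3.0, 0.90))
```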
p  -  r 

Asymptotic Results for $R_g$

Note that procedure $R_g$ at the first stage satisfies (2.13). Suppose we want to determine the (minimum) number of independent variables to be chosen for a specified value of $\Delta$. Assuming the sample size $n$ to be sufficiently large, we study the asymptotic results for the two-stage selection procedure $R_g$. Let $n - p > 4$. Then

$$P^* = P\{V_{ri} \ge D_{ri} \mid \lambda_{ri} = \Delta\} = P\left\{\frac{V_{ri} - E(V_{ri})}{\sqrt{\operatorname{Var}(V_{ri})}} \ge \frac{D_{ri} - E(V_{ri})}{\sqrt{\operatorname{Var}(V_{ri})}}\right\} \approx P\{Z \ge a\} = 1 - \Phi(a),$$

where $Z$ is a standard normal variable, $\Phi(\cdot)$ is the standard normal distribution function, $a = [D_{ri} - E(V_{ri})]/\sqrt{\operatorname{Var}(V_{ri})}$, and, at $\lambda_{ri} = \Delta$,

$$E(V_{ri}) = \frac{(p - r + 2\Delta)(n - p)}{(n - p - 2)(p - r)},$$

$$\operatorname{Var}(V_{ri}) = \frac{2(n-p)^2}{(p-r)^2(n-p-2)}\left[\frac{(p - r + 2\Delta)^2}{(n-p-2)(n-p-4)} + \frac{p - r + 4\Delta}{n - p - 4}\right].$$

For a fixed random sample, we have

$$\hat\Gamma_{ri} = \frac{2(n-p-2)(p-r)}{n-p}\,V_{ri} - (2p - 3r).$$

Now, we rewrite $\hat\Gamma_{ri}$ as follows:

$$\hat\Gamma_{ri} = \frac{2(n-p-2)(p-r)}{n-p}\left[\sqrt{\operatorname{Var}(V_{ri})}\;\frac{V_{ri} - E(V_{ri})}{\sqrt{\operatorname{Var}(V_{ri})}} + E(V_{ri})\right] - (2p - 3r).$$
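The moment formulas for $V_{ri}$ used in the normal approximation can be sanity-checked by simulation. The sketch below (hypothetical names; the Monte Carlo settings are assumptions) compares them with sample moments for $n - p = 40$, a case where the fourth moment of $V_{ri}$ exists, so the sample variance is a stable estimate. In the paper's parametrization the numerator chi-square of $V_{ri}$ has mean $(p - r) + 2\Delta$.

```python
import math
import random


def mean_V(n, p, r, delta):
    x, m = p - r, n - p
    return (x + 2 * delta) * m / ((m - 2) * x)


def var_V(n, p, r, delta):
    x, m = p - r, n - p
    return (2 * m**2 / (x**2 * (m - 2))) * (
        (x + 2 * delta) ** 2 / ((m - 2) * (m - 4)) + (x + 4 * delta) / (m - 4)
    )


def draw_V(rng, df1, df2, delta):
    """One draw of V ~ F'(df1, df2, delta), noncentral part shifted so its mean is df1 + 2*delta."""
    num = (rng.gauss(0.0, 1.0) + math.sqrt(2.0 * delta)) ** 2
    num += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df1 - 1))
    den = rng.gammavariate(df2 / 2.0, 2.0)
    return (num / df1) / (den / df2)


n, p, r, delta = 45, 5, 2, 3.0
rng = random.Random(7)
draws = [draw_V(rng, p - r, n - p, delta) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((v - sample_mean) ** 2 for v in draws) / (len(draws) - 1)
print(sample_mean, mean_V(n, p, r, delta))
print(sample_var, var_V(n, p, r, delta))
```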


We minimize the following function $f^*$, which satisfies $f^* \le \hat\Gamma_{ri}$ whenever $V_{ri} \ge D_{ri}$, to obtain an upper bound of $\Delta$ for a given value $p - r = x$ and $a < 0$.

Let

$$f^* = \frac{2(n-p-2)(p-r)}{n-p}\left[a\sqrt{\operatorname{Var}(V_{ri})} + E(V_{ri})\right] - (2p - 3r) = \sqrt{2}\,D\,a\,(Ax^2 + Bx + C)^{1/2} - x + 4\Delta + p,$$

where

$$A = \frac{(n-p)^2}{(n-p-2)^2(n-p-4)},$$

$$B = \frac{4\Delta(n-p)^2}{(n-p-2)^2(n-p-4)} + \frac{(n-p)^2}{(n-p-4)(n-p-2)},$$

$$C = \frac{4\Delta^2(n-p)^2}{(n-p-2)^2(n-p-4)} + \frac{4\Delta(n-p)^2}{(n-p-4)(n-p-2)},$$

and

$$D = \frac{2(n-p-2)}{n-p}.$$


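Since $A \approx 0$, $B \approx 1$, $C \approx 4\Delta$ and $D \approx 2$ for large $n$, we have

$$f^* \approx 2\sqrt{2}\,a\,(x + 4\Delta)^{1/2} - x + 4\Delta + p.$$

Setting $\partial f^*/\partial\Delta = 0$ gives $4\Delta \approx 2a^2 - x$, and at this point $\partial^2 f^*/\partial\Delta^2 > 0$. Hence $f^*$ is minimized when $4\Delta \approx 2a^2 - x$. From this we can find an upper bound of $\Delta$ that determines how many variables, at least, are excluded, since $\Delta$ is decreasing in $x$; see the following example.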
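The minimization can be checked numerically. The sketch below computes $a = \Phi^{-1}(1 - P^*)$ with Python's `statistics.NormalDist`, evaluates the large-$n$ approximation of $f^*$ on a grid of $\Delta$ values, and compares the grid minimizer with the stationary point $4\Delta = 2a^2 - x$. The names and the grid are assumptions; the value $p = 5$ only shifts $f^*$ and does not move the minimizer.

```python
import math
from statistics import NormalDist


def a_from_p_star(p_star):
    """a solves 1 - Phi(a) = P*, so a = Phi^{-1}(1 - P*) (negative when P* > 0.5)."""
    return NormalDist().inv_cdf(1.0 - p_star)


def f_star_large_n(delta, x, p, a):
    """Large-n form of f*: 2*sqrt(2)*a*sqrt(x + 4*delta) - x + 4*delta + p."""
    return 2.0 * math.sqrt(2.0) * a * math.sqrt(x + 4.0 * delta) - x + 4.0 * delta + p


def stationary_delta(x, a):
    """Solution of d f*/d delta = 0, i.e. 4*delta = 2*a**2 - x."""
    return (2.0 * a * a - x) / 4.0


a = a_from_p_star(0.90)
x, p = 1, 5
grid = [k / 1000.0 for k in range(1, 3001)]
grid_min = min(grid, key=lambda d: f_star_large_n(d, x, p, a))
print(a, stationary_delta(x, a), grid_min)
```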


Example:

We use the Hald data (Draper and Smith, 1981, Appendix B, page 629) to discuss the procedure as follows.


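| No. | $X_1$ | $X_2$ | $X_3$ | $X_4$ | $Y$ |
|---|---|---|---|---|---|
| 1 | 7 | 26 | 6 | 60 | 78.5 |
| 2 | 1 | 29 | 15 | 52 | 74.3 |
| 3 | 11 | 56 | 8 | 20 | 104.3 |
| 4 | 11 | 31 | 8 | 47 | 87.6 |
| 5 | 7 | 52 | 6 | 33 | 95.9 |
| 6 | 11 | 55 | 9 | 22 | 109.2 |
| 7 | 3 | 71 | 17 | 6 | 102.7 |
| 8 | 1 | 31 | 22 | 44 | 72.5 |
| 9 | 2 | 54 | 18 | 22 | 93.1 |
| 10 | 21 | 47 | 4 | 26 | 115.9 |
| 11 | 1 | 40 | 23 | 34 | 83.8 |
| 12 | 11 | 66 | 9 | 12 | 113.3 |
| 13 | 10 | 68 | 8 | 12 | 109.4 |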

Daniel and Wood (1980, p. 89) have computed the $C_{ri}$'s for all equations. Using their values, we compute $\hat\Gamma_{ri}$ as follows:


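| Variables in equation | $r$ | $C_{ri}$ | $\hat\Gamma_{ri}$ | $V_{ri}$ |
|---|---|---|---|---|
| $X_1$ | 2 | 202.7 | 82.6 | 19.25 |
| $X_2$ | 2 | 142.6 | 58.0 | 13.78 |
| $X_1, X_2$ | 3 | 2.7 | 1.9 | 0.97* |
| $X_3$ | 2 | 315.3 | 128.6 | 29.48 |
| $X_1, X_3$ | 3 | 198.2 | 60.6 | 20.52 |
| $X_2, X_3$ | 3 | 62.5 | 19.9 | 6.95 |
| $X_1, X_2, X_3$ | 4 | 3.0 | 3.3 | 0.89* |
| $X_4$ | 2 | 138.8 | 56.5 | 13.44 |
| $X_1, X_4$ | 3 | 5.5 | 2.8 | 1.25 |
| $X_2, X_4$ | 3 | 138.3 | 42.6 | 14.53 |
| $X_1, X_2, X_4$ | 4 | 3.0 | 3.3 | 0.89* |
| $X_3, X_4$ | 3 | 22.4 | 7.8 | 2.94 |
| $X_1, X_3, X_4$ | 4 | 3.5 | 3.4 | 0.94* |
| $X_2, X_3, X_4$ | 4 | 7.3 | 4.1 | 1.37* |
| $X_1, X_2, X_3, X_4$ | 5 | 5.0 | 5.0 | 1.00 |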

As an illustrative example, we compute some $D_{ri}$'s in (3.3) for $P^* = 0.90$ and $\Delta = 3$, with $n = 13$ and $p = 5$:

$r = 2$: $D_{ri} = 0.939$; $r = 3$: $D_{ri} = 1.1106$; $r = 4$: $D_{ri} = 1.647$.

Now we apply the procedure $R_g$. At stage 1, we exclude all inferior reduced models. This results in the selection of the models marked *. Thus, we retain the following reduced models:

$\{X_1, X_2\}$, $\{X_1, X_2, X_3\}$, $\{X_1, X_2, X_4\}$, $\{X_1, X_3, X_4\}$, and $\{X_2, X_3, X_4\}$.

These are the desired reduced models. Then we use the plot of $\hat\Gamma_{ri}$ versus $r$:


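[Figure: $\hat\Gamma_{ri}$ plotted against $r$ for the full model (1,2,3,4) and the retained models (1,2), (1,2,3), (1,2,4), (1,3,4) and (2,3,4); one symbol marks a single point and another marks a double point, where (1,2,3) and (1,2,4) coincide.]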


From this plot, we see that the reduced model $\{X_1, X_2\}$ is our desired model. Note that after the first stage, we can state with confidence probability $P^* = 0.90$ that all reduced models except the 5 reduced models given above are inferior and have been excluded.
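The example can be reproduced from the $C_{ri}$ column alone. The sketch below converts each $C_{ri}$ to $V_{ri}$ and $\hat\Gamma_{ri}$ with the relations stated in Section 2, applies the stage-1 cut-offs listed above for $r = 2, 3, 4$, and then picks the survivor with the smallest $\hat\Gamma_{ri}$. Following the example, the $V_{ri}$ column is the quantity compared with $D_{ri}$; the full model is left out because it is never a candidate for exclusion. Variable names are hypothetical.

```python
# C_ri values for the Hald data, as listed in the example table (Daniel and Wood's figures).
n, p = 13, 5
cp_table = {
    ("X1",): 202.7, ("X2",): 142.6, ("X3",): 315.3, ("X4",): 138.8,
    ("X1", "X2"): 2.7, ("X1", "X3"): 198.2, ("X1", "X4"): 5.5,
    ("X2", "X3"): 62.5, ("X2", "X4"): 138.3, ("X3", "X4"): 22.4,
    ("X1", "X2", "X3"): 3.0, ("X1", "X2", "X4"): 3.0,
    ("X1", "X3", "X4"): 3.5, ("X2", "X3", "X4"): 7.3,
}
critical_D = {2: 0.939, 3: 1.1106, 4: 1.647}  # P* = 0.90, Delta = 3


def v_and_gamma(cp, r):
    v = (cp + n - 2 * r) / (n - r)                                      # from C_ri = (n - r) V_ri - (n - 2r)
    gamma = 2.0 * (n - p - 2) * (p - r) / (n - p) * v - (2 * p - 3 * r)  # Gamma-hat in terms of V_ri
    return v, gamma


stage1 = {}
for variables, cp in cp_table.items():
    r = len(variables) + 1          # intercept plus retained variables
    v, gamma = v_and_gamma(cp, r)
    if v < critical_D[r]:           # stage 1: keep models not excluded as inferior
        stage1[variables] = gamma
best = min(stage1, key=stage1.get)  # stage 2: smallest estimated Gamma
print(sorted(stage1), best)
```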

We also note that the largest value of $r$ for the selected models is 4. If we take 4 as the bound of $r$ to start with, then an approximate upper bound of $\Delta$ can be obtained from the asymptotic relation $4\Delta \approx 2a^2 - (p - r)$. For $P^* = 0.90$, we get $a = \Phi^{-1}(0.10) \approx -1.282$ and, with $p - r = 1$, $4\Delta \approx 2.29$.